MIN-MAX BIAS ROBUST REGRESSION


by


R. D. Martin
V. J. Yohai
R. H. Zamar


TECHNICAL REPORT No. 112
August, 1987


Department of Statistics, GN-22
University of Washington
Seattle, Washington 98195 USA




REPORT DOCUMENTATION PAGE

Title: MIN-MAX BIAS ROBUST REGRESSION
Authors: R.D. Martin, V.J. Yohai and R.H. Zamar
Type of report and period covered: TR 12/1/83 - 5/31/88
Performing organization name and address: Department of Statistics, GN-22, University of Washington, Seattle, WA 98195
Program element, project, task area and work unit numbers: NR-661-003
Controlling office name and address: ONR Code N63374, 1107 NE 45th Street, Seattle, WA 98105
Report date: October 1987
Number of pages: 38
Security classification (of this report): Unclassified
Distribution statement (of this report): APPROVED FOR PUBLIC RELEASE: DISTRIBUTION UNLIMITED.


MIN-MAX BIAS ROBUST REGRESSION

ABSTRACT 

This paper considers the problem of minimizing the maximum asymptotic bias of regression
estimates over $\varepsilon$-contamination neighborhoods for the joint distribution of the response and
carriers. Two classes of estimates are treated: (1) M-estimates with bounded function $\rho$
applied to the scaled residuals, using a very general class of scale estimates, and (2)
bounded influence function type generalized M-estimates. Estimates in the first class are
obtained as the solution of a minimization problem, while estimates in the second class are
specified by an estimating equation. The first class of M-estimates is sufficiently general to
include both Huber "Proposal 2" simultaneous estimates of regression coefficients and
residuals scale, and Rousseeuw-Yohai "S-estimates" of regression [Robust and Nonlinear
Time Series (1984): 256-272]. It is shown that an S-estimate based on a jump-function type
$\rho$ solves the min-max bias problem for the class of M-estimates with very general scale.
This estimate is obtained by the minimization of the $\alpha$-quantile of the squared residuals,
where $\alpha = \alpha(\varepsilon)$ depends on the fraction of contamination $\varepsilon$. When $\varepsilon \to .5$, $\alpha(\varepsilon) \to .5$
and the min-max estimator approaches the least median of squared residuals estimator
introduced by Rousseeuw [J. Am. Statist. Assoc., 79]. For the bounded influence class of
GM-estimates, it is shown that a "sign" type nonlinearity yields the min-max estimate. This
estimate coincides with the minimum gross-error sensitivity GM-estimate. For $p = 1$, the
optimal GM-estimate is optimal among the class of all equivariant regression estimates. The
min-max S-estimator has a breakdown point which is independent of the number of carriers
$p$ and tends to .5 as $\varepsilon$ increases to .5, but has a slow rate of convergence. The min-max
GM-estimate has the usual rate of convergence, but a breakdown point which decreases to
zero with increasing $p$. Finally, we compare the min-max biases for both types of
estimates, for the case where the nominal model is multivariate normal.
1. INTRODUCTION


In spite of the considerable existing literature on robustness, there is relatively little
published work on global robustness. Huber's (1964) min-max variance approach is based
on neighborhoods which are not global by virtue of excluding asymmetric distributions. The
shrinking neighborhood approach introduced by Jaeckel (1972), and used also by Bickel
(1984) and Beran (1977a, 1977b), among others, attempts to deal with asymmetry by putting
bias on the same asymptotic footing as variance. But the shrinking neighborhood approach
could hardly be called global. Approaches based on the influence curve, such as optimal
bounded influence regression (Hampel, 1974; Krasker, 1980; Krasker and Welsch, 1982;
Huber, 1983), inherit the local or infinitesimal aspect of the influence curve itself.

It seems that the main global approach to robustness in recent years has been centered
around the construction of high breakdown point estimates, particularly for multivariate
problems where this approach presents real challenges. See for example Donoho (1982),
Donoho and Huber (1983), Stahel (1981), Rousseeuw (1982), Rousseeuw and Yohai (1984),
Yohai (1987), and Yohai and Zamar (1986). In the latter two papers, the authors construct
regression estimators which have both high breakdown points and high efficiency.

The breakdown point approach is highly attractive for a number of reasons, not the least
of which is the transparency of the concept and the ease with which it can be communicated
to applied statisticians and scientists. On the other hand, one nonetheless wishes to have a
global optimality theory of robustness which emphasizes bias control for fractions of
contamination smaller than the breakdown point. Furthermore, bias is itself a very
transparent concept.

Along these lines we recall that Huber (1964) established the following result in his by
now classic paper: the sample median minimizes the maximum asymptotic bias among all
translation equivariant estimators of location, the maximum being over epsilon-contaminated
distributions (and also Levy neighborhoods). It seems that this approach to global
robustness, namely the construction of min-max bias robust estimators, has been essentially
neglected until quite recently, and this problem is quite clearly articulated in Hampel et al.
(1986) (see the lower left entry of Table 2, p. 176). Among the recent work in this area, we
know of the following as yet unpublished papers: Donoho and Liu (1985), who establish
attractive bias robustness properties of minimum distance estimators; Martin and Zamar
(1987a), who obtain min-max bias robust estimates of scale; and Martin and Zamar (1987b),
who construct min-max bias robust estimates of location, subject to an efficiency constraint
at the nominal model. See also Zamar (1985) for min-max bias orthogonal regression
M-estimates.

In this paper, we construct min-max bias robust regression estimates for two different
classes of estimates: (i) M-estimates based on bounded $\rho$ functions and general scale (i.e., a
general scale estimate for the residuals), and (ii) GM-estimates having bounded influence
curves. In the first case, the estimates are defined by a minimization problem, whereas in the
second case the estimates are defined by an estimating equation.

It turns out that the S-estimators introduced by Rousseeuw and Yohai (1984) can be
regarded as special cases of M-estimates with general scale, as can Huber "proposal 2"
M-estimates for regression and residuals scale. In fact, our min-max bias M-estimate is just
that, an S-estimate.

The paper is organized in the following way. Section 2 introduces the $\varepsilon$-contaminated
model for regression, M-estimates of scale based on bounded, symmetric functions $\rho$, and
the related S-estimates for regression. Section 3 establishes an expression for the maximum
bias of an S-estimate. We also display the special form this expression takes for nominal
multivariate normal models, and the special form obtained for jump functions $\rho_c$, which
take on the values 0 and 1, with jumps at $\pm c$. Section 4 introduces the class of
M-estimates with general scale, constructs a lower bound $A_\rho$ for the maximum bias for fixed
$\rho$, and a lower bound $A^*$ for $A_\rho$ as $\rho$ ranges over a broad class of loss functions. It is
then shown that an S-estimate achieves $A^*$. Section 5 constructs min-max bias
GM-estimates. These estimates are based on a "sign" function type nonlinearity in the
estimating equations, which corresponds to a weighted $L_1$ regression, with weights inversely
proportional to the norm of the vector of carriers. Throughout Sections 2-5 we have, for
simplicity, considered the case where the intercept is known. In Section 6 we indicate how
our results may be extended to the case when the intercept is unknown and must be estimated
along with the slope parameters. Finally, Section 7 provides a comparison of the biases of
min-max S-estimates and GM-estimates for the case where the nominal model is
multivariate normal.
2. GENERAL SETUP AND S-ESTIMATES

2.1 The target model and maximum asymptotic bias.

We assume the target model is the linear model

$$ y = x'\theta_0 + u $$

where $x = (x_1, x_2, \ldots, x_p)'$ is a random vector in $\mathbb{R}^p$, $\theta_0 = (\theta_{01}, \ldots, \theta_{0p})'$ is the vector of
true regression parameters, and the error $u$ is a random variable independent of $x$. Let $F_0$
be the nominal distribution function of $u$ and $G_0$ the nominal distribution function of $x$.
Then the nominal distribution function $H_0$ of $(y, x)$ is

$$ H_0(y, x) = G_0(x)\,F_0(y - x'\theta_0). \qquad (2.1) $$

We assume that $G_0$ is elliptical about the origin, with scatter matrix $A$. Correspondingly,
we work with zero intercept until Section 6, which discusses how our results can be extended
to deal with an intercept.

Let $T$ be an $\mathbb{R}^p$-valued functional defined on a ("large") subset of the space of
distribution functions $H$ on $\mathbb{R}^{p+1}$. This subset is assumed to include all empirical
distribution functions $H_n$ corresponding to a sample $(y_1, x_1), \ldots, (y_n, x_n)$ of size $n$
from $H$. Then $T_n = T(H_n)$ is an estimate of $\theta_0$.

It is further assumed that $T$ is regression equivariant, i.e., if $\tilde y = y + x'b$ and
$\tilde x = C'x$ for some full rank $p \times p$ matrix $C$, then $T(\tilde H) = C^{-1}[T(H) + b]$, where $\tilde H$ is
the distribution of $(\tilde y, \tilde x)$. Correspondingly, the transformed model parameter is
$\tilde\theta_0 = C^{-1}[\theta_0 + b]$.

We define the asymptotic bias $b(T, H)$ of $T$ at $H$ so that it is invariant
under regression equivariant transformations:

$$ b^2(T, H) = (T(H) - \theta_0)'\,A\,(T(H) - \theta_0). \qquad (2.2) $$
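A one-line verification of this invariance, assuming as above that $A$ is the scatter matrix of $x$: under the transformation $\tilde y = y + x'b$, $\tilde x = C'x$ we have $T(\tilde H) - \tilde\theta_0 = C^{-1}(T(H) - \theta_0)$, while $\tilde x$ has scatter matrix $\tilde A = C'AC$. Hence

$$ (T(\tilde H)-\tilde\theta_0)'\,\tilde A\,(T(\tilde H)-\tilde\theta_0) = (T(H)-\theta_0)'(C^{-1})'C'AC\,C^{-1}(T(H)-\theta_0) = (T(H)-\theta_0)'A\,(T(H)-\theta_0), $$

because $(C^{-1})'C' = (CC^{-1})' = I$.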


Therefore, we can assume without loss of generality that $G_0$ is spherical, i.e., $A$ is
the identity matrix, and that $\theta_0 = 0$. Accordingly, the nominal model (2.1) becomes

$$ H_0(y, x) = G_0(\|x\|)\,F_0(y) \qquad (2.3) $$

and correspondingly the asymptotic bias of $T$ at $H$ is given by the Euclidean norm of $T$:

$$ b(T, H) = \|T(H)\|. \qquad (2.4) $$

If the operator $T$ is continuous at $H$, then $T(H)$ is the asymptotic value of the
estimate when the underlying distribution of the sample is $H$. It is assumed that $T$ is
asymptotically unbiased at the nominal model $H_0$:

$$ T(H_0) = 0. \qquad (2.5) $$

We will work with the $\varepsilon$-contamination neighborhood of the fixed nominal distribution $H_0$,

$$ V_\varepsilon = \{H : H = (1-\varepsilon)H_0 + \varepsilon H^*\}, \qquad (2.6) $$

where $H^*$ is an arbitrary distribution on $\mathbb{R}^{p+1}$. The maximum asymptotic bias of $T$
over $V_\varepsilon$ is

$$ B(T) = \sup\{\|T(H)\| : H \in V_\varepsilon\}. \qquad (2.7) $$


2.2 M-estimates of scale.

Let $\rho$ be a real-valued function on $\mathbb{R}$ satisfying the following assumptions.

A1. (i) $\rho$ is symmetric and non-decreasing on $[0, \infty)$, with $\rho(0) = 0$;

(ii) $\rho$ is bounded, with $\lim_{u \to \infty} \rho(u) = 1$;

(iii) $\rho$ has only a finite number of discontinuities.

Let $0 < b < 1$. Then, given a distribution function $F$, the M-scale functional is defined
(see Huber, 1981) as

$$ s(F) = \inf\{ s > 0 : E_F\,\rho(u/s) \le b \}. \qquad (2.8) $$

Given a sample $u = (u_1, \ldots, u_n)$ from $F$, the corresponding M-estimate of scale is
obtained from (2.8) by replacing $F$ by the empirical distribution $F_n$ of $u$.

It is easy to prove that

$$ s(F) > 0 \iff P_F(u = 0) < 1 - b. \qquad (2.9) $$

If this condition is satisfied with $s(F)$ finite, and $\rho$ is continuous, we can replace the
inequality by equality in (2.8).

It can be shown that the breakdown point due to implosion, i.e., due to contamination at
the origin which results in $s(F) = 0$, is $1 - b$, and the breakdown point due to explosion,
i.e., due to contamination tending to infinity which results in $s(F) = \infty$, is $b$. The overall
breakdown point is then $\varepsilon^* = \min\{b, 1-b\}$. For details see Huber (1981).
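For a finite sample the M-scale is easy to compute, because the sample mean of $\rho(u_i/s)$ is non-increasing in $s$. The Python sketch below is an illustration rather than the authors' code: it brackets and then bisects for the smallest $s$ with mean $\rho(u_i/s) \le b$. The choice of $\rho$ is an assumption of the sketch, namely a Tukey bisquare rescaled to have supremum 1 (so that it satisfies A1) with the commonly used constant $c = 1.547$; the function names are hypothetical.

```python
def rho_bisquare(u: float, c: float = 1.547) -> float:
    """Tukey bisquare loss rescaled so that sup rho = 1 (one choice satisfying A1)."""
    t = abs(u) / c
    return 1.0 if t >= 1.0 else 1.0 - (1.0 - t * t) ** 3


def m_scale(u, b=0.5, rho=rho_bisquare, rtol=1e-12):
    """Sample M-scale s = inf{s > 0 : mean(rho(u_i / s)) <= b}, as in (2.8)."""
    n = len(u)
    # Condition (2.9): if at least (1 - b) n observations are zero, the scale implodes to 0.
    if sum(1 for v in u if v == 0) >= (1.0 - b) * n:
        return 0.0

    def mean_rho(s):
        return sum(rho(v / s) for v in u) / n

    # Bracket the solution; mean_rho is non-increasing in s.
    hi = max(abs(v) for v in u)
    while mean_rho(hi) > b:
        hi *= 2.0
    lo = hi / 2.0
    while mean_rho(lo) <= b:  # terminates: mean_rho -> fraction of nonzero u_i > b as s -> 0
        lo /= 2.0
    # Bisection keeps mean_rho(lo) > b >= mean_rho(hi).
    while hi - lo > rtol * hi:
        mid = 0.5 * (lo + hi)
        if mean_rho(mid) > b:
            lo = mid
        else:
            hi = mid
    return hi
```

Multiplying the sample by 3 multiplies the returned scale by 3 (up to the bisection tolerance), and a sample in which at least half of the values are zero returns 0 when $b = .5$, which is the implosion case described by (2.9).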

In the case where one is interested in estimating scale for its own sake, one usually
forces consistency at a nominal model $F_0$ by setting $b = E_{F_0}\rho(u)$. This issue turns out to
be irrelevant for our present purposes since, as we see in the next subsection, we will only be
interested in obtaining a smallest M-estimate of scale with respect to the regression
parameter $\theta$ in a particular parametrization of the scale functional. The choice of $b$ will
therefore remain at our disposal in obtaining a min-max bias regression estimate.

2.3 S-estimators of regression for general $H$

Let $(y, x) \in \mathbb{R}^{p+1}$ be a random vector with arbitrary distribution function $H$; e.g.,
$H$ could be the empirical distribution function for $(y, x)$. For any $\theta \in \mathbb{R}^p$ let $H_\theta$ be the
distribution of the residuals

$$ r(\theta) = y - x'\theta. \qquad (2.11) $$

Let $s(F)$ be any M-estimate of scale as defined in Section 2.2 and, to emphasize the
independent roles of $\theta$ and $H$ in determining $H_\theta$, let $s(\theta, H) = s(H_\theta)$.

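A functional $T(H)$ is said to be an S-estimate functional of regression (see Rousseeuw
and Yohai, 1984) if there exists a sequence $\theta_n \in \mathbb{R}^p$ such that

$$ \lim_{n \to \infty} \theta_n = T(H) \qquad (2.12) $$

and

$$ \lim_{n \to \infty} s(\theta_n, H) = \inf_{\theta \in \mathbb{R}^p} s(\theta, H). \qquad (2.13) $$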


With regard to the existence of such a sequence, we assert: if $\rho$ satisfies A1 and $H$ satisfies

$$ \sup_{\|\theta\| = 1} P_H(x'\theta = 0) < 1 - b, \qquad (2.14) $$

then there exists some $T(H)$ satisfying (2.12) and (2.13).


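This is a consequence of the following lemma: by Lemma 2.1, any sequence $\theta_n$ satisfying (2.13) is bounded, and hence has a convergent subsequence, whose limit satisfies (2.12) and (2.13).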

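Lemma 2.1. Suppose that $\rho$ satisfies A1 and $H$ satisfies (2.14). Then $\|\theta_n\| \to \infty$
implies $\lim_{n \to \infty} s(\theta_n, H) = \infty$.


Proof of Lemma 2.1. Suppose that $\|\theta_n\| \to \infty$ and let $\theta_n^* = \theta_n / \|\theta_n\|$. Without loss of generality we can
assume that $\theta_n^* \to \theta^*$ with $\|\theta^*\| = 1$. To prove the lemma it is enough to show that for
all $s > 0$

$$ \liminf_{n \to \infty} E_H\,\rho\big([y - x'\theta_n]/s\big) > b. $$

Indeed, we can write

$$ E_H\,\rho\big([y - x'\theta_n]/s\big) \ge E_H\Big[\rho\big([y - \|\theta_n\|\,x'\theta_n^*]/s\big)\, I_{\{x'\theta^* \ne 0\}}\Big], $$

where $I_A$ is the indicator of the set $A$. Since on the set $\{x'\theta^* \ne 0\}$ we have
$|y - \|\theta_n\|\,x'\theta_n^*|/s \to \infty$, A1(ii) and the dominated convergence theorem show that the
right-hand side tends to $P_H(x'\theta^* \ne 0)$, which is greater than $b$ by (2.14). □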

It is easy to prove that if A1 is satisfied and $\rho$ is continuous, then (2.12) and (2.13) will
imply

$$ s(T(H), H) = \min\{ s(\theta, H) : \theta \in \mathbb{R}^p \}. \qquad (2.15) $$

However, in general (2.15) may not be true.

Observe that there may be more than one value $T(H)$ satisfying (2.12) and (2.13). In
that case the choice of $T(H)$ is arbitrary.
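For a single carrier ($p = 1$, regression through the origin, as assumed until Section 6), an S-estimate can be approximated by minimizing $s(\theta, H_n)$ directly over a grid of slopes. The Python sketch below is illustrative only: the bisquare $\rho$ with $c = 1.547$, the grid range, and the two-pass refinement are assumptions of this example, the function names are hypothetical, and a grid search gives an approximate minimizer rather than the exact functional defined by (2.12) and (2.13).

```python
def rho(u: float, c: float = 1.547) -> float:
    """Bisquare rho rescaled to sup 1 (an illustrative choice satisfying A1)."""
    t = abs(u) / c
    return 1.0 if t >= 1.0 else 1.0 - (1.0 - t * t) ** 3


def m_scale(r, b=0.5, rtol=1e-10):
    """M-scale (2.8) of the residuals r, by bracketing and bisection."""
    n = len(r)
    if sum(1 for v in r if v == 0) >= (1.0 - b) * n:
        return 0.0  # implosion, cf. (2.9)

    def mean_rho(s):
        return sum(rho(v / s) for v in r) / n

    hi = max(abs(v) for v in r)
    while mean_rho(hi) > b:
        hi *= 2.0
    lo = hi / 2.0
    while mean_rho(lo) <= b:
        lo /= 2.0
    while hi - lo > rtol * hi:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mean_rho(mid) > b else (lo, mid)
    return hi


def s_estimate_slope(x, y, b=0.5, lo=-10.0, hi=10.0, n_grid=400):
    """Approximate S-estimate of a slope through the origin: minimize s(theta, H_n) on a grid."""
    def scale_at(theta):
        return m_scale([yi - xi * theta for xi, yi in zip(x, y)], b)

    best = lo
    for _ in range(2):  # a coarse pass, then a finer pass around the coarse minimizer
        step = (hi - lo) / n_grid
        best = min((lo + k * step for k in range(n_grid + 1)), key=scale_at)
        lo, hi = best - step, best + step
    return best
```

For instance, with $x = 1, \ldots, 10$, responses within $\pm 0.1$ of $2x$ for the first seven points and $-20$ for the last three (30% gross outliers), the sketch returns a slope close to 2, whereas the least squares slope through the origin is pulled below zero.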


3. MAXIMUM BIAS OF S-ESTIMATES


3.1 Maximum bias of general S-estimates

Assume now the target model $H_0$ is given by (2.3). We will need the following
assumptions.

A2. $F_0$ is absolutely continuous with density $f_0$ which is symmetric, continuous and
strictly decreasing for $u \ge 0$.

A3. $G_0$ is spherical and $P_{G_0}(x'\theta = 0) = 0$ for all $\theta \in \mathbb{R}^p$ with $\theta \ne 0$.

Under A3, it is easy to see that the distribution of $x'\theta$ depends only on $\|\theta\|$. Thus we set

$$ g(s, \|\theta\|) = E_{H_0}\,\rho\Big(\frac{y - x'\theta}{s}\Big). \qquad (3.1) $$


The following lemma is a key result.


Lemma 3.1. Assume that $\rho$ satisfies A1, $F_0$ satisfies A2 and $G_0$ satisfies A3. Then $g$ is
continuous, strictly increasing with respect to $\|\theta\|$ and strictly decreasing in $s$ for $s > 0$.


Proof of Lemma 3.1. Continuity of $g$ follows from A1(iii) and A2: since $\rho$ is continuous
a.s. $[F_0]$, the expectation of $\rho((y - x'\theta)/s)$ with respect to $F_0$ is a continuous function of
$x'\theta$ and $s$ (see for example Billingsley, p. 181). Let $s_1 < s_2$. Since
$\rho((y - x'\theta)/s_1) \ge \rho((y - x'\theta)/s_2)$ a.s. $[H_0]$, we have

$$ E_{H_0}\,\rho\big((y - x'\theta)/s_1\big) \ge E_{H_0}\,\rho\big((y - x'\theta)/s_2\big). $$

In addition, we have strict inequality unless $(y - x'\theta)/s_1 = (y - x'\theta)/s_2$ a.s. $[H_0]$, that is,
unless $y - x'\theta = 0$ a.s. $[H_0]$. The last is impossible because of the independence of $y$ and $x$.
By A3, the distribution of $x'a$ is the same for any unit vector $a$. Thus the distribution of
$x'\theta$ is the same as that of $\|\theta\|\,z$, where $z$ is a random variable distributed as $x'a$, $\|a\| = 1$.
Assume without loss of generality that $s = 1$ and let $t_2 > t_1 \ge 0$. Since $y$ is symmetric
about 0 and independent of $z$, the conditional expectation $q(t, z) = E[\rho(y - tz) \mid z]$ is
a non-decreasing function of $|tz|$. Hence

$$ g(1, t_2) = E\,q(t_2, z) \ge E\,q(t_1, z) = g(1, t_1), $$

and equality holds only if $|t_2 z| = |t_1 z|$ almost surely, that is, only if $z = 0$ almost surely.
The last is impossible because of A3. □

From Lemma 3.1 it is immediate that an S-estimate $T(H)$ is Fisher consistent at the
target model $H_0$.

Let $g_1^{-1}(\cdot, \|\theta\|)$ be the inverse of $g$ with respect to $s$ and $g_2^{-1}(s, \cdot)$ the
inverse of $g$ with respect to $\|\theta\|$. The following theorem gives the maximum bias of an
S-estimate.

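Theorem 3.1. Under the same assumptions as in Lemma 3.1, the maximum bias $B(T)$ of
an S-estimate $T$ over the contamination neighborhood $V_\varepsilon$ is given by

$$ B(T) = \begin{cases} g_2^{-1}\Big(g_1^{-1}\Big(\dfrac{b-\varepsilon}{1-\varepsilon}, 0\Big), \dfrac{b}{1-\varepsilon}\Big), & \text{if } \varepsilon < \min(b, 1-b), \\[1ex] \infty, & \text{if } \varepsilon \ge \min(b, 1-b). \end{cases} \qquad (3.2) $$

Therefore the asymptotic breakdown point of $T$ is $\varepsilon^* = \min(b, 1-b)$.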
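Proof. Let

$$ c = g_2^{-1}\Big(g_1^{-1}\Big(\frac{b-\varepsilon}{1-\varepsilon}, 0\Big), \frac{b}{1-\varepsilon}\Big), $$

and suppose that $\varepsilon < \min(b, 1-b)$. To prove that

$$ B(T) \le c \qquad (3.3) $$

it is enough to show that for any $H$ of the form $H = (1-\varepsilon)H_0 + \varepsilon H^*$, $\|\theta\| > c$ implies

$$ s(\theta, H) > s(0, H). \qquad (3.4) $$

Put $s_1 = g_1^{-1}((b-\varepsilon)/(1-\varepsilon), 0)$. Then by Lemma 3.1, $\|\theta\| > c$ implies that

$$ g(s_1, \|\theta\|) > \frac{b}{1-\varepsilon}. \qquad (3.5) $$

Also by Lemma 3.1, there exists $s_2 > s_1$ such that

$$ g(s_2, \|\theta\|) > \frac{b}{1-\varepsilon}. \qquad (3.6) $$

Then

$$ E_H\,\rho\Big(\frac{y - x'\theta}{s_2}\Big) \ge (1-\varepsilon)\,g(s_2, \|\theta\|) > b, $$

and therefore, by the definition of $s(\theta, H) = s(H_\theta)$ (see (2.8)), we have

$$ s(\theta, H) \ge s_2. \qquad (3.7) $$

On the other hand,

$$ g(s_1, 0) = \frac{b-\varepsilon}{1-\varepsilon}. \qquad (3.8) $$

Combining (3.8) and Lemma 3.1, we have for any $H = (1-\varepsilon)H_0 + \varepsilon H^*$ and any $s > s_1$:

$$ E_H\,\rho\Big(\frac{y}{s}\Big) \le (1-\varepsilon)\,g(s, 0) + \varepsilon < (1-\varepsilon)\,g(s_1, 0) + \varepsilon = b. $$

Therefore $s \ge s(0, H)$ for all $s > s_1$, and so

$$ s_1 \ge s(0, H). \qquad (3.9) $$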

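Thus (3.4) follows from (3.7) and (3.9), since $s(\theta, H) \ge s_2 > s_1 \ge s(0, H)$, and so (3.3) holds.
Now we will prove that

$$ B(T) \ge c. \qquad (3.10) $$

Let $c_1$ be any positive number smaller than $c$ and let $\theta^*$ satisfy $\|\theta^*\| = c_1$. Let $H_n^*$ be the
distribution concentrated at the point mass $(y_n, x_n)$, where $x_n = \lambda_n\theta^*$ with $\lambda_n \to \infty$, and
$y_n = x_n'\theta^*$. Set $H_n = (1-\varepsilon)H_0 + \varepsilon H_n^*$. In order to prove (3.10) it is enough to show

$$ \sup_n \|T(H_n)\| \ge c_1. \qquad (3.11) $$

Suppose (3.11) is not true. Then, by passing to a subsequence, which we continue to label
$H_n$, we have $T(H_n) = \theta_n$, with

$$ \lim_{n \to \infty} \theta_n = \tilde\theta \qquad (3.12) $$

and

$$ \|\tilde\theta\| < \|\theta^*\| = c_1. \qquad (3.13) $$

It follows that $|y_n - x_n'\theta_n| = \lambda_n\,|\theta^{*\prime}(\theta^* - \theta_n)| \to \infty$, since
$\theta^{*\prime}(\theta^* - \tilde\theta) \ge c_1(c_1 - \|\tilde\theta\|) > 0$. Then, since

$$ E_{H_n}\,\rho\Big(\frac{y - x'\theta_n}{s}\Big) = (1-\varepsilon)\,g(s, \|\theta_n\|) + \varepsilon\,\rho\Big(\frac{y_n - x_n'\theta_n}{s}\Big), \qquad (3.14) $$

letting $s < s_1 = g_1^{-1}((b-\varepsilon)/(1-\varepsilon), 0)$ and using Lemma 3.1 gives

$$ \liminf_{n \to \infty} E_{H_n}\,\rho\Big(\frac{y - x'\theta_n}{s}\Big) \ge (1-\varepsilon)\,g(s, 0) + \varepsilon > (1-\varepsilon)\,g(s_1, 0) + \varepsilon = (b-\varepsilon) + \varepsilon = b. $$

This implies that $\liminf_n s(\theta_n, H_n) \ge s$ for all $s < s_1$, and so we have

$$ \liminf_{n \to \infty} s(\theta_n, H_n) \ge s_1. \qquad (3.15) $$

On the other hand,

$$ (1-\varepsilon)\,g(s_1, c_1) < (1-\varepsilon)\,g(s_1, c) = b, $$

and by Lemma 3.1 we can find $s_2 < s_1$ such that

$$ (1-\varepsilon)\,g(s_2, c_1) < b. $$

Since $y_n - x_n'\theta^* = 0$ and $\rho(0) = 0$, this gives

$$ E_{H_n}\,\rho\Big(\frac{y - x'\theta^*}{s_2}\Big) = (1-\varepsilon)\,g(s_2, c_1) < b $$

and

$$ s(\theta^*, H_n) \le s_2. \qquad (3.16) $$

Since (3.15) and (3.16) contradict the fact that $T(H_n) = \theta_n$ minimizes $s(\cdot, H_n)$ for each
$n$, we have established (3.10). In order to complete the proof it is enough to show that if
$\varepsilon \uparrow \min(b, 1-b)$, then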


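$$ g_2^{-1}\Big(g_1^{-1}\Big(\frac{b-\varepsilon}{1-\varepsilon}, 0\Big), \frac{b}{1-\varepsilon}\Big) \to \infty. \qquad (3.17) $$

Let $b \le 0.5$, so that $\min(b, 1-b) = b$. Then we have

$$ \lim_{\varepsilon \uparrow b} g_1^{-1}\Big(\frac{b-\varepsilon}{1-\varepsilon}, 0\Big) = \lim_{\delta \to 0} g_1^{-1}(\delta, 0) = \infty, $$

and so

$$ \lim_{\varepsilon \uparrow b} g_2^{-1}\Big(g_1^{-1}\Big(\frac{b-\varepsilon}{1-\varepsilon}, 0\Big), \frac{b}{1-\varepsilon}\Big) = \lim_{s \uparrow \infty} g_2^{-1}\Big(s, \frac{b}{1-b}\Big) = \infty. $$

□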


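3.2 Maximum bias of S-estimates for (y, x) multivariate normal.

If $z = (y, x')' \sim N(0, I_{p+1})$, then $y - x'\theta \sim N(0, 1 + \gamma^2)$ with $\gamma = \|\theta\|$, so that

$$ g(s, \gamma) = h\big((1+\gamma^2)^{1/2}/s\big), $$

where $h(\lambda) = E\,\rho(\lambda u)$, with $u \sim N(0, 1)$. Then

$$ g_1^{-1}(t, \gamma) = \frac{(1+\gamma^2)^{1/2}}{h^{-1}(t)} $$

and

$$ g_2^{-1}(s, t) = \big([s\,h^{-1}(t)]^2 - 1\big)^{1/2}. $$

This gives the following expression for the squared bias, for $\varepsilon < \min(b, 1-b)$:

$$ B^2(T) = \left[\frac{h^{-1}\big(b/(1-\varepsilon)\big)}{h^{-1}\big((b-\varepsilon)/(1-\varepsilon)\big)}\right]^2 - 1. \qquad (3.18) $$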
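Formula (3.18) only requires $h$ and its inverse, so it can be evaluated numerically for any $\rho$. The Python sketch below works under stated assumptions: $\rho$ is the rescaled bisquare with $c = 1.547$ (a common choice for which $E\,\rho(u) \approx 0.5$ under the standard normal), $h$ is computed by Simpson's rule on $[0, 10]$ using the symmetry of the integrand, and $h^{-1}$ is obtained by bisection on $\log\lambda$, which is valid because $h$ is increasing. All function names are hypothetical.

```python
import math


def rho_bisquare(u: float, c: float = 1.547) -> float:
    """Tukey bisquare rescaled so that sup rho = 1 (an illustrative rho satisfying A1)."""
    t = abs(u) / c
    return 1.0 if t >= 1.0 else 1.0 - (1.0 - t * t) ** 3


def h(lam: float, rho=rho_bisquare, upper: float = 10.0, n: int = 1000) -> float:
    """h(lam) = E rho(lam * u), u ~ N(0, 1), by composite Simpson's rule on [0, upper].

    The integrand is even in u, so the integral over [0, upper] is doubled.
    """
    step = upper / n  # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        u = i * step
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * rho(lam * u) * math.exp(-0.5 * u * u)
    return 2.0 * total * step / 3.0 / math.sqrt(2.0 * math.pi)


def h_inv(t: float, rho=rho_bisquare, iters: int = 60) -> float:
    """Solve h(lam) = t for 0 < t < 1 by bisection on log(lam); h is increasing in lam."""
    lo, hi = 1e-6, 1e6
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if h(mid, rho) < t:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def max_bias_normal(eps: float, b: float = 0.5, rho=rho_bisquare) -> float:
    """Maximum bias B(T) from (3.18) for an S-estimate at the Gaussian model."""
    if eps >= min(b, 1.0 - b):
        return math.inf  # at or beyond the breakdown point min(b, 1 - b)
    ratio = h_inv(b / (1.0 - eps), rho) / h_inv((b - eps) / (1.0 - eps), rho)
    return math.sqrt(max(ratio * ratio - 1.0, 0.0))
```

At $\varepsilon = 0$ both arguments of $h^{-1}$ coincide and the bias is 0; it then grows with $\varepsilon$ and is infinite from $\varepsilon = \min(b, 1-b)$ on.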


3.3 Maximum bias of S-estimates when $\rho$ is a jump function.

Consider the special family of jump functions $\rho_c$ (which satisfy A1):

$$ \rho_c(u) = \begin{cases} 0 & \text{if } |u| < c, \\ 1 & \text{if } |u| \ge c. \end{cases} \qquad (3.19) $$

Given a sample $u = (u_1, \ldots, u_n)$, the corresponding M-estimate of scale is given by

$$ s_c(u) = |u|_{(n - [nb])}/c, $$

where $|u|_{(1)} \le \cdots \le |u|_{(n)}$ are the order statistics for the absolute values
$|u_1|, \ldots, |u_n|$, and $[\cdot]$ denotes the integer part.

For the choice $\rho_c$, the corresponding regression S-estimate minimizes the (approximate)
$1-b$ quantile of the absolute values $|y_i - x_i'\theta|$ of the residuals. Note that this regression
S-estimate does not depend upon the choice of $c$, and so we henceforth set $c = 1$.

When $b = .5$, $s_1(u) = |u|_{(n - [n/2])}$ is the median absolute value (MAV) estimate of
scale. The corresponding S-estimate is identical to Rousseeuw's (1984) least median of
squared residuals (LMS) regression estimate. (Minimization with respect to $\theta$ of a quantile
of any monotone transformation of the absolute values $|y_i - x_i'\theta|$ results in the same
estimate.)
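With $c = 1$ and a single carrier (regression through the origin), the jump-function S-estimate can be computed exactly by enumeration. As a function of $\theta$, the order statistic $|r|_{(n-[nb])}(\theta)$ of the absolute residuals is continuous and piecewise linear, with kinks only where some residual vanishes ($\theta = y_i/x_i$) or where two absolute residuals cross ($\theta = (y_i \mp y_j)/(x_i \mp x_j)$). Assuming enough of the $x_i$ are nonzero for a minimum to exist, it is therefore attained at one of these candidates. The Python sketch below relies on that observation; it is an $O(n^3 \log n)$ illustration for small samples rather than an efficient LMS algorithm, and the function names are hypothetical.

```python
import math


def jump_scale(u, b=0.5, c=1.0):
    """M-scale for the jump function rho_c: s_c(u) = |u|_(n - [nb]) / c (1-based order statistic)."""
    n = len(u)
    k = n - math.floor(n * b)
    return sorted(abs(v) for v in u)[k - 1] / c


def jump_s_estimate_slope(x, y, b=0.5):
    """Exact minimizer over theta of jump_scale(y - x * theta) for regression through the origin."""
    n = len(x)
    candidates = [yi / xi for xi, yi in zip(x, y) if xi != 0]  # kinks: a residual equals 0
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] != x[j]:  # y_i - x_i t = y_j - x_j t
                candidates.append((y[i] - y[j]) / (x[i] - x[j]))
            if x[i] + x[j] != 0:  # y_i - x_i t = -(y_j - x_j t)
                candidates.append((y[i] + y[j]) / (x[i] + x[j]))

    def objective(theta):
        return jump_scale([yi - xi * theta for xi, yi in zip(x, y)], b)

    return min(candidates, key=objective)
```

With `b = 0.5` this is the LMS slope through the origin: for seven points on the line $y = 2x$ plus three gross outliers, the fifth smallest absolute residual vanishes at $\theta = 2$, so the sketch returns exactly 2.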

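The following lemma gives the maximum bias of an S-estimate when $\rho = \rho_1$.


Lemma 3.3. Let $T_b$ be the S-estimate with jump function $\rho_1$ and right-hand side $b$.
Assume $F_0$ satisfies A2 and $G_0$ satisfies A3. Then, for $\varepsilon < \min(b, 1-b)$,

(i)

$$ B(T_b) = G_{s_b}^{-1}\Big(\frac{b}{1-\varepsilon}\Big), \qquad s_b = F_0^{-1}\Big(1 - \frac{b-\varepsilon}{2(1-\varepsilon)}\Big), $$

where

$$ G_s(\|\theta\|) = 1 - E_{G_0}F_0(s + x'\theta) + E_{G_0}F_0(-s + x'\theta) \qquad (3.20) $$

and $G_s^{-1}$ denotes the inverse of $G_s$;

(ii)

$$ \inf_{\varepsilon < b < 1-\varepsilon} B(T_b) = \inf_{F_0^{-1}(1/(2(1-\varepsilon))) < s < \infty} G_s^{-1}\Big(2(1 - F_0(s)) + \frac{\varepsilon}{1-\varepsilon}\Big). \qquad (3.21) $$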


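Proof. In this case we have

$$ g(s, \|\theta\|) = P(|y - x'\theta| \ge s) = G_s(\|\theta\|). \qquad (3.22) $$

We also have

$$ g(s, 0) = 2(1 - F_0(s)). \qquad (3.23) $$

Using (3.22) and (3.23) in (3.2) gives (i). The result (ii) is obtained by substituting
$s = F_0^{-1}\big(1 - (b-\varepsilon)/(2(1-\varepsilon))\big)$, so that $b/(1-\varepsilon) = 2(1 - F_0(s)) + \varepsilon/(1-\varepsilon)$, and noting
that $\varepsilon < b < 1-\varepsilon$ corresponds to $F_0^{-1}(1/(2(1-\varepsilon))) < s < \infty$. □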

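In the case that $(y, x)$ is multivariate normal, using (3.18) and the fact that for $\rho = \rho_1$

$$ h(\lambda) = 2\big(1 - \Phi(1/\lambda)\big), $$

we get

$$ B^2(T_b) = \left[\frac{\Phi^{-1}\Big(1 - \dfrac{b-\varepsilon}{2(1-\varepsilon)}\Big)}{\Phi^{-1}\Big(1 - \dfrac{b}{2(1-\varepsilon)}\Big)}\right]^2 - 1, \qquad (3.24) $$

where $\Phi$ is the $N(0, 1)$ distribution function.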

It is interesting to note from (3.2) that two distinct values of $b$ give rise to any specified
breakdown point $\varepsilon^* \in (0, .5)$, namely $b = \varepsilon^*$ and $b = 1 - \varepsilon^*$. The estimates for
two such values of $b$ have different maximal bias curves (i.e., plots of $B(T_b) = B(T_b, \varepsilon)$
versus $\varepsilon$), both of which explode at $\varepsilon^*$. In Figure 1 we display two such curves, with bias
as a function of $\varepsilon$ given by (3.24), for the values $b = .15$ and $b = .85$, which correspond
to a breakdown point $\varepsilon^* = .15$. The breakdown at $\varepsilon^* = .15$ is due to implosion for
$b = .85$ and due to explosion for $b = .15$ (cf. comments in Section 2.2).
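The two curves of Figure 1 can be tabulated directly from (3.24). The Python sketch below (standard library only; the function names are hypothetical) evaluates $B(T_b)$ on a grid of $\varepsilon$ values for $b = .15$ and $b = .85$ and returns infinity at and beyond the common breakdown point $\varepsilon^* = .15$. The extra check on the denominator guards against floating-point ties exactly at the breakdown point.

```python
import math
from statistics import NormalDist

PHI_INV = NormalDist().inv_cdf  # quantile function of N(0, 1)


def max_bias_jump_normal(b: float, eps: float) -> float:
    """Maximum bias B(T_b) of the jump-function S-estimate at the Gaussian model, eq. (3.24)."""
    if eps >= min(b, 1.0 - b):
        return math.inf  # at or beyond the breakdown point min(b, 1 - b)
    num = PHI_INV(1.0 - (b - eps) / (2.0 * (1.0 - eps)))
    den = PHI_INV(1.0 - b / (2.0 * (1.0 - eps)))
    if den <= 0.0:  # floating-point tie at the breakdown point
        return math.inf
    return math.sqrt(max(num * num / (den * den) - 1.0, 0.0))


def bias_curve(b: float, eps_values):
    """List of (eps, B(T_b)) pairs, as plotted in Figure 1."""
    return [(eps, max_bias_jump_normal(b, eps)) for eps in eps_values]


eps_grid = [k / 100 for k in range(0, 16)]  # 0.00, 0.01, ..., 0.15
for b in (0.15, 0.85):
    for eps, bias in bias_curve(b, eps_grid):
        print(f"b={b:.2f} eps={eps:.2f} B={bias:.4f}")
```

Both curves start at 0 for $\varepsilon = 0$, increase with $\varepsilon$, and are infinite at $\varepsilon = .15$.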


4. M-ESTIMATES WITH GENERAL SCALE


4.1 Definition of M-estimates with general scale

Let $\rho$ be a function satisfying A1 and let $s(H)$ be a (very) general estimate of the
residuals scale. For example, the general scale functional $s(H)$ may be determined
simultaneously with $\theta$, or independently of $\theta$. It is assumed that $s(H)$ is regression
invariant (i.e., invariant under regression transformations $\tilde y = y + x'b$ and $\tilde x = C'x$), and
residuals scale equivariant (i.e., equivariant under a residuals scale change $\tilde u = au$).
Furthermore, we will assume that $s(H)$ has a breakdown point greater than $\varepsilon$, namely

A4. $s_1 = \inf\{s(H) : H = (1-\varepsilon)H_0 + \varepsilon H^*\} > 0$ and

$s_2 = \sup\{s(H) : H = (1-\varepsilon)H_0 + \varepsilon H^*\} < \infty$.

Then an M-estimator $T(H)$ of regression, with general scale, is determined by solving
the minimization problem

$$ \inf_{\theta \in \mathbb{R}^p} E_H\,\rho\Big(\frac{y - x'\theta}{s(H)}\Big). \qquad (4.1) $$

Under the assumptions on $s(H)$, $T(H)$ is clearly regression equivariant.

If the infimum in (4.1) is attained then it defines $T(H)$, with the choice of $T(H)$
arbitrary in the case of non-uniqueness. If a value $\theta$ which attains (4.1) does not exist, then
$T(H)$ is defined by

$$ T(H) = \lim_{n \to \infty} \theta_n, \qquad (4.2) $$

where the $\theta_n$ satisfy

$$ \lim_{n \to \infty} E_H\,\rho\Big(\frac{y - x'\theta_n}{s(H)}\Big) = \inf_{\theta \in \mathbb{R}^p} E_H\,\rho\Big(\frac{y - x'\theta}{s(H)}\Big). \qquad (4.3) $$

Again, in the case of non-uniqueness, the choice of $T(H)$ is arbitrary. It is easy to check
that S-estimates are special types of M-estimates with general scale (see Rousseeuw and
Yohai, 1984), as are Huber (1971, 1981) "proposal 2" simultaneous M-estimates of
regression and scale.

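4.2 Lower bound for the minimax bias of M-estimators.

Let $g(s, \|\theta\|)$ be as in (3.1) and put

$$ A_\rho(s) = g_2^{-1}\Big(s, g(s, 0) + \frac{\varepsilon}{1-\varepsilon}\Big), \qquad A_\rho = \inf_{s > 0} A_\rho(s). \qquad (4.4) $$

The following lemma shows that $A_\rho$ is in fact a lower bound for the maximum bias over
$V_\varepsilon$ of an M-estimate with general scale.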

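Lemma 4.1. Let $T$ be an M-estimate with general scale. Assume $\rho$ satisfies A1, $F_0$
satisfies A2, $G_0$ satisfies A3, and the scale $s(H)$ satisfies A4. Then

$$ B(T) \ge A_\rho. \qquad (4.5) $$

Proof. Let $B = B(T)$, suppose that $B < A_\rho$, and take $\gamma > 0$ such that

$$ B < A_\rho - \gamma. \qquad (4.6) $$

Also take $\theta$ such that

$$ B < \|\theta\| \le A_\rho - \gamma. \qquad (4.7) $$

Let $H_n^*$ be the distribution corresponding to a point mass at $(y_n, x_n)$, where
$y_n - x_n'\theta = 0$ and $x_n = \lambda_n\theta$ with $\lambda_n \to \infty$. Put $H_n = (1-\varepsilon)H_0 + \varepsilon H_n^*$ and

$$ \theta_n^* = T(H_n). \qquad (4.8) $$

If $\theta_n^*$ is unbounded, (4.6) is contradicted and the lemma is proved. Assume $\theta_n^*$ is
bounded; then we may also assume that $\theta_n^* \to \tilde\theta$. By A4 we may also assume that
$s_n = s(H_n) \to s > 0$. According to (4.6) we have

$$ \|\tilde\theta\| \le B < A_\rho - \gamma. \qquad (4.9) $$

Let

$$ L_n(\theta) = E_{H_n}\,\rho\Big(\frac{y - x'\theta}{s_n}\Big). $$

Then we have

$$ L_n(\theta_n^*) \ge (1-\varepsilon)\,E_{H_0}\,\rho\Big(\frac{y - x'\theta_n^*}{s_n}\Big) + \varepsilon\,\rho\Big(\frac{\lambda_n(\|\theta\|^2 - \theta'\theta_n^*)}{s_n}\Big). $$

Since (4.7) and (4.9) imply $\|\theta\|^2 - \theta'\tilde\theta \ge \|\theta\|(\|\theta\| - \|\tilde\theta\|) > 0$, we have
$|\lambda_n(\|\theta\|^2 - \theta'\theta_n^*)| \to \infty$, and therefore, by Lemma 3.1,

$$ \liminf_{n \to \infty} L_n(\theta_n^*) \ge (1-\varepsilon)\,g(s, \|\tilde\theta\|) + \varepsilon \ge (1-\varepsilon)\,g(s, 0) + \varepsilon. $$

We also have

$$ L_n(\theta) = (1-\varepsilon)\,E_{H_0}\,\rho\Big(\frac{y - x'\theta}{s_n}\Big) = (1-\varepsilon)\,g(s_n, \|\theta\|), $$

and then by Lemma 3.1 we have

$$ \lim_{n \to \infty} L_n(\theta) = (1-\varepsilon)\,g(s, \|\theta\|). $$

Since $L_n(\theta_n^*) \le L_n(\theta)$, we also have

$$ (1-\varepsilon)\,g(s, 0) + \varepsilon \le (1-\varepsilon)\,g(s, \|\theta\|). $$

Therefore, by Lemma 3.1, we have

$$ \|\theta\| \ge g_2^{-1}\Big(s, g(s, 0) + \frac{\varepsilon}{1-\varepsilon}\Big) = A_\rho(s) \ge A_\rho, $$

and this contradicts (4.7). □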


4.3 Optimality of S-estimates with jump function $\rho$.

From now on it will be convenient to show explicitly that $g$ depends on $\rho$, and so we
will write $g_\rho(s, \|\theta\|)$. For $t \in \mathbb{R}$ and $s > 0$ define

$$ h_\rho(s, t) = E_{F_0}\,\rho\Big(\frac{y - t}{s}\Big). $$

We will need the following assumption.

A2*. $F_0$ has a density $f_0$ satisfying A2, and for $t > 0$ and $y > 0$

$$ a(y) = \frac{f_0(y + t) + f_0(y - t)}{f_0(y)} $$

is a non-decreasing function of $y$.

A2* is satisfied, for example, in the important case where $F_0$ is the Gaussian distribution $N(0, \sigma^2)$. This follows because in the Gaussian case we have

$$a(y) = \frac{f_0(y + t) + f_0(y - t)}{f_0(y)} = 2e^{-t^2/(2\sigma^2)}\cosh\left(\frac{ty}{\sigma^2}\right)$$

and

$$a'(y) = \frac{2t}{\sigma^2}\,e^{-t^2/(2\sigma^2)}\sinh\left(\frac{ty}{\sigma^2}\right) > 0$$

if $t > 0$ and $y > 0$.


A2* evidently holds in a number of other interesting situations; for example, it is easy to verify A2* when $F_0$ is double exponential.
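The monotonicity condition in A2* can be checked numerically for a given density. The sketch below evaluates $a(y)$ on a finite grid of y and t for the standard Gaussian, compared with the closed form above, and for the double exponential. For contrast it also includes the Cauchy density. For Cauchy with $t = 3$, $a(y)$ rises and then falls (about 10.3 at $y = 3$, about 3.0 at $y = 8$), so A2* fails. A grid check is only evidence, not a proof, and the function names are illustrative.

```python
import math

def gaussian_pdf(y: float, sigma: float = 1.0) -> float:
    return math.exp(-0.5 * (y / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def double_exp_pdf(y: float) -> float:
    return 0.5 * math.exp(-abs(y))

def cauchy_pdf(y: float) -> float:
    return 1.0 / (math.pi * (1.0 + y * y))

def a_ratio(f0, y: float, t: float) -> float:
    """a(y) = [f0(y + t) + f0(y - t)] / f0(y), the ratio appearing in A2*."""
    return (f0(y + t) + f0(y - t)) / f0(y)

def gaussian_a_closed_form(y: float, t: float, sigma: float = 1.0) -> float:
    """2 exp(-t^2 / (2 sigma^2)) cosh(t y / sigma^2), the Gaussian case of a(y)."""
    return 2.0 * math.exp(-t * t / (2.0 * sigma ** 2)) * math.cosh(t * y / sigma ** 2)

def satisfies_A2_star_on_grid(f0, ts, ys, tol: float = 1e-12) -> bool:
    """True if a(y) is non-decreasing along the grid ys for every t in ts (finite check only)."""
    for t in ts:
        vals = [a_ratio(f0, y, t) for y in ys]
        if any(b < a - tol * max(1.0, abs(a)) for a, b in zip(vals, vals[1:])):
            return False
    return True

ys = [0.01 * k for k in range(1, 801)]  # grid on (0, 8]
ts = (0.5, 1.0, 3.0)
print(satisfies_A2_star_on_grid(gaussian_pdf, ts, ys))    # True
print(satisfies_A2_star_on_grid(double_exp_pdf, ts, ys))  # True
print(satisfies_A2_star_on_grid(cauchy_pdf, ts, ys))      # False
```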

The following lemmas will show that $A_\rho$ is minimized when $\rho$ is a jump function. This will enable us to compute the minimum of $A_\rho$.


Lemma 4.2. Assume $\rho$ satisfies A1 and $F_0$ satisfies A2*. Let $s > 0$ and let c be such that the jump function $\rho_c$ satisfies

$$h_{\rho_c}(s,0) = h_\rho(s,0). \qquad (4.10)$$

Then

$$h_{\rho_c}(s,t) \ge h_\rho(s,t) \quad \forall\, t \in \mathbb{R}.$$


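Proof. Since $\rho$ and $f_0$ are even, the change of variable $y = (u - t)/s$ gives $h_\rho(s,t) = s\int_0^\infty \rho(y)\,[f_0(sy+t) + f_0(sy-t)]\,dy$. Because $\rho_c(y) = 0$ for $|y| \le c$ and $\rho_c(y) = 1$ for $|y| > c$,

$$h_{\rho_c}(s,t) - h_\rho(s,t) = s\left\{-\int_0^c \rho(y)\,[f_0(sy+t) + f_0(sy-t)]\,dy + \int_c^\infty (1-\rho(y))\,[f_0(sy+t) + f_0(sy-t)]\,dy\right\}.$$

With

$$k = \frac{f_0(sc+t) + f_0(sc-t)}{f_0(sc)},$$

A2* gives, for $t > 0$ (the case $t < 0$ follows by symmetry, and $t = 0$ is (4.10)),

$$\int_0^c \rho(y)\,[f_0(sy+t) + f_0(sy-t)]\,dy \le k\int_0^c \rho(y)\,f_0(sy)\,dy,$$

$$\int_c^\infty (1-\rho(y))\,[f_0(sy+t) + f_0(sy-t)]\,dy \ge k\int_c^\infty (1-\rho(y))\,f_0(sy)\,dy.$$

Thus (4.10) gives

$$h_{\rho_c}(s,t) - h_\rho(s,t) \ge sk\left\{-\int_0^c \rho(y)\,f_0(sy)\,dy + \int_c^\infty (1-\rho(y))\,f_0(sy)\,dy\right\} = \frac{k}{2}\,[h_{\rho_c}(s,0) - h_\rho(s,0)] = 0. \qquad \square$$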
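To see Lemma 4.2 at work, the following sketch takes $F_0 = N(0,1)$, which satisfies A2*, and Tukey's bisquare $\rho$, an illustrative choice satisfying A1. It picks c from (4.10) and evaluates $h_{\rho_c}(s,t) - h_\rho(s,t)$ on a grid of $t \ge 0$; negative t follow by symmetry. According to the lemma the differences should be non-negative up to quadrature error. Under the Gaussian assumption $h_{\rho_c}(s,t) = P(|u - t| > sc)$ has a closed form, while $h_\rho$ is computed by Simpson's rule. The helper names are illustrative.

```python
from statistics import NormalDist

PHI = NormalDist()  # F0 = N(0, 1), which satisfies A2* (illustrative choice)

def bisquare_rho(y: float) -> float:
    """Tukey's bisquare with tuning constant 1, scaled so that sup rho = 1 (satisfies A1)."""
    return 1.0 if abs(y) >= 1.0 else 1.0 - (1.0 - y * y) ** 3

def h_smooth(rho, s: float, t: float, n: int = 2000, umax: float = 12.0) -> float:
    """h_rho(s, t) = E rho((u - t) / s) with u ~ N(0, 1), by composite Simpson on [-umax, umax]."""
    step = 2.0 * umax / n  # n even
    total = 0.0
    for i in range(n + 1):
        u = -umax + i * step
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * rho((u - t) / s) * PHI.pdf(u)
    return step / 3.0 * total

def h_jump(c: float, s: float, t: float) -> float:
    """h_{rho_c}(s, t) = P(|u - t| > s c) for the jump function rho_c(y) = 1{|y| > c}."""
    return 1.0 - (PHI.cdf(t + s * c) - PHI.cdf(t - s * c))

def matching_c(rho, s: float) -> float:
    """c solving (4.10): h_{rho_c}(s, 0) = 2 (1 - Phi(s c)) = h_rho(s, 0)."""
    h0 = h_smooth(rho, s, 0.0)
    return PHI.inv_cdf(1.0 - h0 / 2.0) / s

for s in (0.5, 1.0, 2.0):
    c = matching_c(bisquare_rho, s)
    gap = min(h_jump(c, s, 0.25 * k) - h_smooth(bisquare_rho, s, 0.25 * k) for k in range(17))
    print(f"s={s}: c={c:.4f}, min over t of h_rho_c - h_rho = {gap:.2e}")
```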


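Lemma 4.3. Assume $\rho$ satisfies A1, $F_0$ satisfies A2* and $G_0$ satisfies A3. Then for any $s > 0$ there exists a jump function $\rho_c$ such that

(i) $g_{\rho_c}(s,0) = g_\rho(s,0)$,

(ii) $g_{\rho_c}(s,t) \ge g_\rho(s,t) \quad \forall\, t \in \mathbb{R}$.

Proof: Since $g_\rho(s, \|\theta\|) = E_{G_0}\,h_\rho(s, \mathbf{x}'\theta)$, the result follows from Lemma 4.2 by conditioning on $\mathbf{x}$. □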


Lemma 4.4: Assume $\rho$ satisfies A1, $F_0$ satisfies A2* and $G_0$ satisfies A3. Then

(i) $A_\rho(s) \ge \inf_c A_{\rho_c}(s)$,

(ii) $A_{\rho_c}(s) = G_{sc}^{-1}\left(2(1 - F_0(sc)) + \dfrac{\epsilon}{1-\epsilon}\right)$,

where $G_c(\lambda)$ is defined in (3.20).

Proof:

(i) Follows immediately from Lemma 4.3.

(ii) Follows from the definition of $A_\rho(s)$, (3.22) and (3.23). □

The following theorem, together with Lemma 3.3(ii), shows that an S-estimator with a jump function $\rho_c$ minimizes the maximum bias over the class of M-estimates.


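Theorem 4.1. Let T be an M-estimate and assume A1, A2*, A3 and A4. Then

$$B(T) \ge \inf\left\{G_c^{-1}\left(2(1 - F_0(c)) + \frac{\epsilon}{1-\epsilon}\right) : F_0^{-1}\left(\frac{1}{2(1-\epsilon)}\right) < c < \infty\right\}.$$

Proof: The theorem follows from Lemmas 4.1 and 4.4, since $A_{\rho_c}(s)$ depends on s and c only through the product sc. Because $G_c(\lambda) < 1$, the inverse $G_c^{-1}\left(2[1 - F_0(c)] + \frac{\epsilon}{1-\epsilon}\right)$ is defined only when $2[1 - F_0(c)] + \frac{\epsilon}{1-\epsilon} < 1$, and this is equivalent to $c > F_0^{-1}\left(\frac{1}{2(1-\epsilon)}\right)$. □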
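Under the Gaussian model of Section 5.4 ($u$ and $x_1$ independent $N(0,1)$), the bound in Theorem 4.1 reduces to a one-dimensional minimization over c. This sketch assumes that $G_c(\lambda)$ in (3.20) is the exceedance probability $P(|u - \lambda x_1| > c) = 2[1 - \Phi(c/\sqrt{1+\lambda^2})]$. That is an assumption, since (3.20) is not reproduced here, but it is consistent with $G_c(\lambda) < 1$ in the proof. With it, $G_c^{-1}$ has a closed form, and a grid search over c reproduces the legible S* entries of Table 1 (.49, 1.05, 1.37 at $\epsilon$ = .05, .15, .20) to within about 0.01.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # u and x_1 independent N(0, 1): the Gaussian model of Section 5.4 (assumption)

def G_inv(c: float, level: float) -> float:
    """Solve G_c(lam) = level for lam >= 0, assuming G_c(lam) = P(|u - lam x_1| > c)
    = 2 [1 - Phi(c / sqrt(1 + lam^2))] under the Gaussian model."""
    z = PHI.inv_cdf(1.0 - level / 2.0)  # need c / sqrt(1 + lam^2) = z
    if z <= 0.0:
        return math.inf
    return math.sqrt(max((c / z) ** 2 - 1.0, 0.0))

def s_star_bias(eps: float, c_max: float = 6.0, n_grid: int = 6000) -> float:
    """Grid search for inf over c of G_c^{-1}(2 (1 - F0(c)) + eps / (1 - eps)) (Theorem 4.1)."""
    delta = eps / (1.0 - eps)
    c_min = PHI.inv_cdf(1.0 / (2.0 * (1.0 - eps)))  # G_c^{-1} is defined only for c > c_min
    best = math.inf
    for k in range(1, n_grid + 1):
        c = c_min + (c_max - c_min) * k / n_grid
        level = 2.0 * (1.0 - PHI.cdf(c)) + delta
        if level < 1.0:
            best = min(best, G_inv(c, level))
    return best

for eps in (0.05, 0.10, 0.15, 0.20):
    print(eps, round(s_star_bias(eps), 3))
```

The bound does not involve p, in line with the remark in Section 7 that the min-max bias of the S-estimate is independent of the number of carriers.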


5.  GM-ESTIMATES 


5.1  Characterizing  the  Bias  of  GM-Estimates 

We now consider GM-estimates of regression $T = T(H)$ obtained by solving

$$E_H\,\eta(y - \mathbf{x}'\theta, \|\mathbf{x}\|)\,\frac{\mathbf{x}}{\|\mathbf{x}\|} = 0 \qquad (5.1)$$

for $\theta$. The following assumptions will be used.

A5. $\eta(u, v)$ is

(i) continuous,

(ii) odd, and monotone non-decreasing in u,

(iii) bounded, with $\sup_{u,v} \eta(u, v) = 1$.

Observe that the optimal bounded influence estimates obtained by Krasker (1980) and Krasker and Welsch (1982) are of this form, with $\eta(u, v) = \psi_c(uv)$ in the Huber family

$$\psi_c(u) = \operatorname{sign}(u)\min(c, |u|). \qquad (5.2)$$
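A direct transcription of (5.2) and of the Krasker-Welsch choice $\eta(u,v) = \psi_c(uv)$ is sketched below. Dividing by c is our own normalization to meet A5(iii). Multiplying $\eta$ by a positive constant leaves the solution of (5.1) unchanged, so the resulting estimate is the same.

```python
def huber_psi(u: float, c: float) -> float:
    """Huber's psi_c(u) = sign(u) * min(c, |u|), eq. (5.2)."""
    return max(-c, min(c, u))

def eta_kw(u: float, v: float, c: float) -> float:
    """eta(u, v) = psi_c(u v), rescaled by 1/c so that sup eta = 1 as required by A5(iii)."""
    return huber_psi(u * v, c) / c

print(huber_psi(3.0, 1.5), huber_psi(-0.5, 1.5), eta_kw(-4.0, 2.0, 1.5))  # 1.5 -0.5 -1.0
```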

The following lemma characterizes the possible biases of GM-estimates when $H \in V_\epsilon$.

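Lemma 5.1. Assume that $\eta$ satisfies A5 and $F_0$ satisfies A2. Let $T(H)$ be the GM-estimator defined by (5.1). Then there exists $H_\epsilon = (1-\epsilon)H_0 + \epsilon H^*$ such that $T(H_\epsilon) = \theta$ if and only if

$$\left\|E_{H_0}\,\eta(y - \mathbf{x}'\theta, \|\mathbf{x}\|)\,\frac{\mathbf{x}}{\|\mathbf{x}\|}\right\| \le \frac{\epsilon}{1-\epsilon}. \qquad (5.3)$$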

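Proof. If there exists an $H \in V_\epsilon$ such that $T(H) = \theta$, then (5.3) follows immediately from (5.1), since $|\eta| \le 1$ by A5. Suppose now that (5.3) is satisfied with strict inequality, and put

$$\mathbf{w} = E_{H_0}\,\eta(y - \mathbf{x}'\theta, \|\mathbf{x}\|)\,\frac{\mathbf{x}}{\|\mathbf{x}\|}.$$

In this case there exist u, v with $v > 0$ such that

$$\eta(u, v) = \|\mathbf{w}\|\,\frac{1-\epsilon}{\epsilon}.$$

Indeed, the right-hand side lies in $[0, 1)$ by the strict inequality in (5.3), $\eta(0, v) = 0$ because $\eta$ is odd in u, and $\eta$ is continuous with supremum 1. Take as $H^*$ the distribution with point mass at

$$\mathbf{x}_0 = -v\,\frac{\mathbf{w}}{\|\mathbf{w}\|}, \qquad y_0 = u + \mathbf{x}_0'\theta.$$

Then if $H = (1-\epsilon)H_0 + \epsilon H^*$, we have

$$E_H\,\eta(y - \mathbf{x}'\theta, \|\mathbf{x}\|)\,\frac{\mathbf{x}}{\|\mathbf{x}\|} = (1-\epsilon)\,\mathbf{w} + \epsilon\,\|\mathbf{w}\|\,\frac{1-\epsilon}{\epsilon}\cdot\frac{-v\,\mathbf{w}}{v\,\|\mathbf{w}\|} = 0. \qquad \square$$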
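The construction in the proof of Lemma 5.1 is explicit enough to code. The sketch below uses a hypothetical $\eta(u,v) = \tanh(uv)$, which satisfies A5, and takes $v = 1$. It solves $\eta(u, v) = \|\mathbf{w}\|(1-\epsilon)/\epsilon$ for u, builds the point mass $(y_0, \mathbf{x}_0)$, and checks that under the mixture the left-hand side of (5.1) vanishes at the prescribed $\theta$. Here w and $\theta$ are arbitrary inputs; in the proof w is the expectation under $H_0$.

```python
import math

def eta(u: float, v: float) -> float:
    """Hypothetical eta(u, v) = tanh(u v): continuous, odd and non-decreasing in u for v > 0, sup = 1."""
    return math.tanh(u * v)

def dot(a, b) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))

def worst_case_point(w, theta, eps, v=1.0):
    """Point mass (y0, x0) from the proof of Lemma 5.1.

    w stands for E_{H0} eta(y - x'theta, ||x||) x / ||x||; requires 0 < ||w|| < eps / (1 - eps)."""
    norm_w = math.sqrt(dot(w, w))
    kappa = norm_w * (1.0 - eps) / eps  # required value of eta(u, v)
    if not 0.0 < kappa < 1.0:
        raise ValueError("need 0 < ||w|| < eps / (1 - eps)")
    u = math.atanh(kappa) / v            # solves tanh(u v) = kappa
    x0 = [-v * wi / norm_w for wi in w]  # ||x0|| = v, pointing along -w
    y0 = u + dot(x0, theta)
    return y0, x0

def contaminated_score(w, theta, eps, y0, x0):
    """(1 - eps) w + eps * eta(y0 - x0'theta, ||x0||) x0 / ||x0||: the left side of (5.1) under H."""
    r0 = y0 - dot(x0, theta)
    n0 = math.sqrt(dot(x0, x0))
    return [(1.0 - eps) * wi + eps * eta(r0, n0) * xi / n0 for wi, xi in zip(w, x0)]

w = [0.02, -0.01, 0.015]
theta = [0.3, 0.1, -0.2]
y0, x0 = worst_case_point(w, theta, eps=0.10)
print(contaminated_score(w, theta, 0.10, y0, x0))  # entries are zero up to rounding
```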


5.2 Optimality of the sign function $\eta_0$

Consider the GM-estimate based on the "sign" function $\eta_0(u, v) = \operatorname{sgn}(u)$:

$$E\,\operatorname{sgn}(y - \mathbf{x}'\theta)\,\frac{\mathbf{x}}{\|\mathbf{x}\|} = 0. \qquad (5.4)$$


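The solution $\theta(H)$ of (5.4) minimizes

$$E\,\frac{|y - \mathbf{x}'\theta|}{\|\mathbf{x}\|}. \qquad (5.5)$$

Thus the estimate is a weighted $L_1$ estimate with weights $\|\mathbf{x}_i\|^{-1}$ for a finite sample $(y_i, \mathbf{x}_i)$, $i = 1, \ldots, n$. In the case p = 1, since $|y_i - x_i\theta|/|x_i| = |y_i/x_i - \theta|$, it is easy to see that the estimate is the median of the slopes:

$$\hat\theta_0 = \operatorname{med}_{1 \le i \le n}\left(\frac{y_i}{x_i}\right). \qquad (5.6)$$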
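For p = 1 the estimate (5.6) is easy to compute. The sketch below uses made-up data, computes the median of the slopes, and evaluates the finite-sample criterion (5.5) nearby to show that the median minimizes it. Skipping observations with $x_i = 0$ is our own convention, since their weight $\|\mathbf{x}_i\|^{-1}$ is undefined.

```python
import statistics

def median_of_slopes(xs, ys):
    """theta_0 = med_i(y_i / x_i), eq. (5.6); points with x_i = 0 are skipped (our convention)."""
    return statistics.median([y / x for x, y in zip(xs, ys) if x != 0])

def weighted_l1(theta, xs, ys):
    """Finite-sample version of (5.5): sum over i of |y_i - x_i theta| / |x_i| (x_i != 0)."""
    return sum(abs(y - x * theta) / abs(x) for x, y in zip(xs, ys) if x != 0)

xs = [1.0, -2.0, 0.5, 3.0, -1.5]   # illustrative data
ys = [2.0, -3.0, 0.8, 10.0, 6.0]
theta0 = median_of_slopes(xs, ys)  # slopes 2.0, 1.5, 1.6, 3.33, -4.0 -> median 1.6
print(theta0, weighted_l1(theta0, xs, ys))
```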


We shall now show that the choice $\eta_0$ minimizes the maximum bias over $V_\epsilon$. We need the following lemma.


Lemma 5.2. Assume $\psi: \mathbb{R} \to \mathbb{R}$ is (a) odd, (b) monotone non-decreasing and (c) $\sup \psi = 1$. It follows that:

(i) $q_\psi(t) = E_{F_0}\,\psi(y + t)$ is monotone non-decreasing.

(ii) If $F_0$ is symmetric, then $q_\psi(t)\,t \ge 0$ and $q_\psi(-t) = -q_\psi(t)$.

(iii) If $F_0$ satisfies A2, then $|q_\psi(t)| \le |q_{\psi_0}(t)|$, where $\psi_0(u) = \operatorname{sgn}(u)$.

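Proof.

(i) Let $t_2 > t_1$; then

$$q_\psi(t_2) - q_\psi(t_1) = E_{F_0}[\psi(y + t_2) - \psi(y + t_1)] \ge 0$$

by property (b) of $\psi$.

(ii) Since $q_\psi(0) = 0$ by (a) and the symmetry of $F_0$, (i) gives $q_\psi(t)\,t \ge 0$. On the other hand

$$q_\psi(-t) = E_{F_0}\,\psi(y - t) = E_{F_0}\,\psi(-y - t) = -E_{F_0}\,\psi(y + t) = -q_\psi(t).$$

(iii) By (ii) we can assume $t > 0$, and therefore

$$q_\psi(t) = \int_0^\infty \psi(y)\,[f_0(y - t) - f_0(y + t)]\,dy.$$

Since $\psi(y) \le 1$ and $f_0(y - t) \ge f_0(y + t)$ for $y \ge 0$, we have

$$0 \le q_\psi(t) \le \int_0^\infty [f_0(y - t) - f_0(y + t)]\,dy = q_{\psi_0}(t). \qquad \square$$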
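Lemma 5.2 can be illustrated numerically for $F_0 = N(0,1)$, where $q_{\psi_0}(t) = E\,\operatorname{sgn}(y + t) = 2\Phi(t) - 1$. The sketch compares this with $q_\psi$ for a hypothetical $\psi$, the Huber function rescaled to have supremum 1, computed by Simpson's rule. It checks the sign property and oddness of (ii) and the domination in (iii) on a grid of t.

```python
from statistics import NormalDist

PHI = NormalDist()  # F0 = N(0, 1) (illustrative; satisfies A2)

def q(psi, t: float, n: int = 2000, umax: float = 12.0) -> float:
    """q_psi(t) = E psi(y + t), y ~ N(0, 1), by composite Simpson on [-umax, umax]."""
    step = 2.0 * umax / n  # n even
    total = 0.0
    for i in range(n + 1):
        y = -umax + i * step
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * psi(y + t) * PHI.pdf(y)
    return step / 3.0 * total

def huber_unit(u: float, k: float = 1.0) -> float:
    """Huber psi scaled to sup = 1: odd, non-decreasing, bounded (hypothetical choice of psi)."""
    return max(-1.0, min(1.0, u / k))

def q_sign(t: float) -> float:
    """q_{psi_0}(t) = E sgn(y + t) = 2 Phi(t) - 1 for standard normal y."""
    return 2.0 * PHI.cdf(t) - 1.0

for t in (-2.0, -0.5, 0.5, 2.0):
    print(t, round(q(huber_unit, t), 4), round(q_sign(t), 4))
```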

Now we can prove that $\eta_0$ is optimal.

Theorem 5.1. Suppose that $\eta$ satisfies A5, $F_0$ satisfies A2 and $G_0$ satisfies A3. Let T be the GM-estimate based on $\eta$ and $T_0$ be the GM-estimate based on $\eta_0$; then

$$B(T, \epsilon) \ge B(T_0, \epsilon).$$


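Proof: Let

$$r_\eta(\|\theta\|) = \left\|E_{H_0}\,\eta(y - \mathbf{x}'\theta, \|\mathbf{x}\|)\,\frac{\mathbf{x}}{\|\mathbf{x}\|}\right\|. \qquad (5.7)$$

A3 implies that the right-hand side expectation depends only on $\|\theta\|$. Then according to Lemma 5.1 it is enough to show that

$$r_\eta(\|\theta\|) \le r_{\eta_0}(\|\theta\|). \qquad (5.8)$$

Setting $\theta = \lambda(1, 0, \ldots, 0)'$ for $\lambda > 0$, without loss of generality, we have

$$r_\eta(\lambda) = \left|E_{H_0}\,\eta(y - \lambda x_1, \|\mathbf{x}\|)\,\frac{x_1}{\|\mathbf{x}\|}\right|. \qquad (5.9)$$

Taking conditional expectation with respect to $\mathbf{x}$ in (5.9) we get

$$r_\eta^*(\lambda, \mathbf{x}) = E_{H_0}\left[\eta(y - \lambda x_1, \|\mathbf{x}\|)\,\frac{x_1}{\|\mathbf{x}\|}\,\middle|\,\mathbf{x}\right] = q_\psi(-\lambda x_1)\,\frac{x_1}{\|\mathbf{x}\|}, \qquad \psi(\cdot) = \eta(\cdot, \|\mathbf{x}\|), \qquad (5.10)$$

and therefore, by Lemma 5.2 with $t = -\lambda x_1$, we get

$$|r_\eta^*(\lambda, \mathbf{x})| \le \left|E_{H_0}\left[\eta_0(y - \lambda x_1, \|\mathbf{x}\|)\,\frac{x_1}{\|\mathbf{x}\|}\,\middle|\,\mathbf{x}\right]\right| = |r_{\eta_0}^*(\lambda, \mathbf{x})|. \qquad (5.11)$$

By Lemma 5.2(ii), $r_\eta^*(\lambda, \mathbf{x}) \le 0$ and $r_{\eta_0}^*(\lambda, \mathbf{x}) \le 0$ for every $\mathbf{x}$, so the absolute value in (5.9) may be taken inside the expectation. Then (5.9) through (5.11) yield (5.8). □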

5.3 Optimality of $\eta_0$ among all Equivariant Estimates for p = 1

So far we have obtained min-max bias robust estimates over two specific classes of equivariant regression estimates. It would of course be highly desirable to obtain a min-max bias solution over the class of all equivariant regression estimates. Although it is not yet clear how to obtain such an estimate for general p, we have the following solution for the special case p = 1.
Theorem 5.2. For the model (2.6) with p = 1, the median of the slopes estimate $\hat\theta_0$ given by (5.6) minimizes the maximum bias among all regression equivariant estimates.

Proof: The proof follows lines quite analogous to Huber's (1964) proof of the min-max bias property of the median among all translation equivariant estimates.


5.4  Computing  the  maximum  bias 

Lemma 5.3. Assume $\eta$ satisfies A5, $F_0$ satisfies A2 and $G_0$ satisfies A3; then if T is the GM-estimate corresponding to $\eta$ we have

(i) $r_\eta(\lambda)$ is monotone non-decreasing in $\lambda$;

(ii) $B(T) = r_\eta^{-1}\left(\dfrac{\epsilon}{1-\epsilon}\right)$.

Proof. According to Lemma 5.2, $|r_\eta^*(\lambda, \mathbf{x})|$ defined in (5.10) is monotone non-decreasing in $|\lambda|$ for all $\mathbf{x}$. Then (i) follows. Use of (i) and Lemma 5.1 gives (ii). □

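We now compute $r_{\eta_0}(\lambda)$ when y and $\mathbf{x}$ are normal. From (5.9) we have, for $p > 1$,

$$r_{\eta_0}(\lambda) = \left|E\,\operatorname{sign}(y - \lambda x_1)\,\frac{x_1}{(x_1^2 + v)^{1/2}}\right|,$$

where y and $x_1$ are N(0, 1), v is chi-square with $p - 1$ degrees of freedom ($\chi^2_{p-1}$), and y, $x_1$ and v are independent. Since $E[\operatorname{sign}(y - \lambda x_1) \mid x_1, v] = 1 - 2\Phi(\lambda x_1)$, where $\Phi$ denotes the standard normal distribution function, we obtain

$$r_{\eta_0}(\lambda) = \left|E\,(2\Phi(\lambda x_1) - 1)\,\frac{x_1}{(x_1^2 + v)^{1/2}}\right|.$$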
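The last display can be evaluated by Monte Carlo and inverted, as in Lemma 5.3(ii), to obtain the maximal bias $B(T_0) = r_{\eta_0}^{-1}(\epsilon/(1-\epsilon))$ of the sign-based GM-estimate under the Gaussian model with known covariance. The sample size, seed and bisection bracket below are arbitrary choices. For p = 1 we have $x_1/\|\mathbf{x}\| = \operatorname{sgn}(x_1)$, so $r_{\eta_0}(\lambda) = P(|y/x_1| \le \lambda) = (2/\pi)\arctan\lambda$, because $y/x_1$ is standard Cauchy. Hence $B(T_0) = \tan(\pi\epsilon/(2(1-\epsilon)))$, which gives about 0.083, 0.18, 0.28 and 0.41 at $\epsilon$ = .05, .10, .15, .20, matching the p = 1 row of Table 1. The sketch uses this closed form to check the simulation.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def phi_cdf(z: float) -> float:
    """Standard normal distribution function Phi via math.erf."""
    return 0.5 * (1.0 + math.erf(z / SQRT2))

def sample_carriers(p: int, n: int, seed: int = 0):
    """Draws pairs (x_1, ||x||) with ||x|| = sqrt(x_1^2 + v), x_1 ~ N(0, 1), v ~ chi-square(p - 1)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        v = rng.gammavariate((p - 1) / 2.0, 2.0) if p > 1 else 0.0  # chi2_k = Gamma(k/2, scale 2)
        norm = math.sqrt(x1 * x1 + v)
        if norm > 0.0:
            draws.append((x1, norm))
    return draws

def r_eta0(lam: float, draws) -> float:
    """Monte Carlo estimate of r_{eta_0}(lam) = |E (2 Phi(lam x_1) - 1) x_1 / (x_1^2 + v)^{1/2}|."""
    total = sum((2.0 * phi_cdf(lam * x1) - 1.0) * x1 / norm for x1, norm in draws)
    return abs(total / len(draws))

def gm_max_bias(eps: float, draws, lam_hi: float = 1e3, iters: int = 40) -> float:
    """B(T_0) = r_{eta_0}^{-1}(eps / (1 - eps)) by bisection (Lemma 5.3); inf if not reached."""
    target = eps / (1.0 - eps)
    if r_eta0(lam_hi, draws) < target:
        return math.inf
    lo, hi = 0.0, lam_hi
    for _ in range(iters):  # each summand is non-decreasing in lam, so bisection is valid
        mid = 0.5 * (lo + hi)
        if r_eta0(mid, draws) < target:
            lo = mid
        else:
            hi = mid
    return hi

draws_p1 = sample_carriers(p=1, n=20000, seed=1)
for eps in (0.05, 0.10, 0.15, 0.20):
    closed_form = math.tan(math.pi * eps / (2.0 * (1.0 - eps)))
    print(eps, round(gm_max_bias(eps, draws_p1), 3), round(closed_form, 3))
```

For p > 1, replace `p=1` by the number of carriers. The results are then Monte Carlo approximations to the known-covariance curves only, with no closed form to check against.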


6.  INCLUDING  THE  INTERCEPT 


The results so far do not cover the case of a regression model with an intercept. This is because they were obtained under the assumption that the contamination affects all the coordinates of $\mathbf{x}$. Nevertheless, all our results for the regression parameter remain unchanged for the regression model with intercept

$$y = \alpha + \mathbf{x}'\theta + u, \qquad (6.1)$$

where y, $\mathbf{x}$, $\theta$ and u are as before and $\alpha$ is the intercept parameter.

Consider the following class of S-estimates of $(\alpha, \theta)$. Let $T^*$ be any location functional defined on the class of distribution functions on $\mathbb{R}$. Given a $\rho$ function as in Section 2.1, and a distribution function H on $\mathbb{R}^{p+1}$, we define an S-estimate $T(H)$ of the regression parameter as the vector $\theta$ which minimizes the scale functional $s(F^*_{H,\theta})$, where $F^*_{H,\theta}$ is the distribution function of $y - \mathbf{x}'\theta - T^*(F_{H,\theta})$ and where $F_{H,\theta}$ is the distribution function of $y - \mathbf{x}'\theta$. Now one naturally takes the final location estimate to be $T^*(F_{H,T(H)})$, i.e., the location estimate $T^*$ applied to the "residuals" $y - \mathbf{x}'T(H)$. This class contains as a particular case the usual S-estimate of the regression and intercept parameters, simply by taking $T^*$ equal to the corresponding S-estimate of location. Similar extensions are possible for M- and GM-estimates.

Assume now that $T^*$ is Fisher consistent, i.e., $T^*(F) = 0$ for any symmetric distribution F on $\mathbb{R}$, and that $T^*$ has breakdown point at least $\epsilon$. Then it can be shown that the results of Theorems 3.1, 4.1 and 5.1 still hold for estimating $\theta$ in the model (6.1).

It remains to find $(T, T^*)$, with T an M-estimate with general scale (or a GM-estimate) and $T^*$ a location estimate, such that the maximum bias of the intercept is minimized. We conjecture that choosing $T^*$ to be the median and T the corresponding min-max bias estimate for $\theta$ will solve this problem.


7.  COMPARING  MIN-MAX  BIAS  ESTIMATES 


The result of solving the min-max bias problem over the class of regression M-estimates with general scale and bounded $\rho$ is the discontinuous jump function $\rho_c$. Consequently the S-estimate which achieves the min-max bias does not have an influence curve, and it has a slower rate of convergence than usual: namely, the same rate of convergence as Rousseeuw's (1984) least median of squared residuals (LMS) estimate. This is evidently the price one has to pay when one wishes to control bias over the class of M-estimates with bounded $\rho$. On the other hand, the min-max bias is independent of the number of carriers, p.

The min-max bias GM-estimate of Section 5 does have a bounded influence curve (see Hampel et al., 1986), and enjoys the usual rate of convergence under regularity conditions. However, its bias and breakdown point depend upon the dimensionality p of the carrier space (see Maronna, Bustos and Yohai, 1979, and Maronna and Yohai, 1987a). Furthermore, it is necessary to robustly estimate the covariance matrix to implement the GM-estimate, and this is not necessary for the S-estimate.

Nonetheless, one wonders how the two min-max estimates compare for fractions of contamination smaller than their breakdown points. First, some computations were carried out under the unrealistic assumption that the covariance matrix of the carriers is known. Figure 2 displays the resulting bias curves of the min-max GM-estimate for p = 1, 2, 3, 5, 10 and 15 carriers, along with the bias curve of the min-max S-estimate S* and the maximal bias curve of the LMS estimate (these latter biases being independent of the number of carriers p). Several observations are immediate. For each p ≥ 2 the optimal GM-estimate has significantly smaller bias than the optimal S-estimate for fractions of contamination not too close to the GM-estimate breakdown point. Of course, as $\epsilon$ approaches the breakdown point of a GM-estimate for any given p, the S-estimate will strongly dominate the GM-estimate. Also, the performance of LMS (which is the limiting form of S* as $\epsilon \to .5$) is sufficiently close to that of S* to regard it as an "excellent" approximation to a min-max bias solution (this is very similar to the results of Martin and Zamar, 1987a, who show that an appropriately scaled median is an excellent approximation to the min-max bias scale estimate for a positive random variable).

By Theorem 5.2 the optimal GM-estimate $\hat\theta_0$ for p = 1 is min-max bias optimal among all regression equivariant estimates with model intercept zero, and also has breakdown point .5. This global optimality of the GM-estimate, and its actual degree of dominance over the optimal S-estimate at p = 1, raise the following important question: Does there exist a min-max bias regression estimate among the class of all regression equivariant estimates?

We also made some calculations to reveal how estimation of the covariance matrix inflates the min-max biases of the GM-estimates. In order to do so we made use of recent results on the maximal bias of covariance estimates due to Maronna and Yohai (1987b). The results are displayed in Table 1 for the case of the covariance matrix estimate studied by Tyler (1987). Clearly, the price of estimating covariance can be high, even when the fraction of contamination is far from the breakdown point of the GM-estimate with known covariance. See for example the $\epsilon = .05$, p = 15 and $\epsilon = .2$, p = 3 cases. Of course, the smaller breakdown points of the covariance matrix estimates result in smaller breakdown points for the GM-estimates with estimated covariance.

The gross-error-sensitivity (GES) is the supremum of the norm of the influence curve, and it is a measure of the maximal bias caused by a vanishingly small fraction of contamination. The GES is the derivative of the maximal bias curve at $\epsilon = 0$, for well-behaved estimators having an influence curve (which LMS and S* do not!). In Figure 2, we display GES-based linear approximations to maximal bias for the optimal GM-estimates for p = 1 and p = 10. The GES approximation seems rather good for values of $\epsilon$ up to, say, 40% or 50% of the breakdown point. This is in agreement with Hampel's rule of thumb (see Hampel et al., 1986, p. 178).
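For p = 1 the maximal bias curve of the optimal GM-estimate has a closed form under the Gaussian model of Section 5.4. There $r_{\eta_0}(\lambda) = P(|y/x_1| \le \lambda) = (2/\pi)\arctan\lambda$, since $y/x_1$ is standard Cauchy, and Lemma 5.3 gives $B(\epsilon) = \tan(\pi\epsilon/(2(1-\epsilon)))$. Its slope at $\epsilon = 0$ is $\pi/2$, which by the relation stated above should equal the GES in this case. The sketch below is our own derivation, not a computation reported here. It compares the GES-based line with the exact curve up to half the breakdown point .5.

```python
import math

def max_bias_p1(eps: float) -> float:
    """Maximal bias of the median-of-slopes GM-estimate, p = 1, Gaussian model:
    B(eps) = tan(pi * eps / (2 * (1 - eps)))."""
    return math.tan(math.pi * eps / (2.0 * (1.0 - eps)))

GES_P1 = math.pi / 2.0  # derivative of max_bias_p1 at eps = 0

for eps in (0.05, 0.10, 0.15, 0.20, 0.25):
    exact = max_bias_p1(eps)
    linear = GES_P1 * eps
    print(f"eps={eps:.2f}  max bias={exact:.3f}  GES*eps={linear:.3f}  ratio={linear / exact:.2f}")
```

Because $\tan x > x$ and $\epsilon/(1-\epsilon) > \epsilon$, the linear approximation always understates the maximal bias here, and the gap grows with $\epsilon$.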


GM-estimates
  p       ε = 0.05       ε = 0.10      ε = 0.15      ε = 0.20
  1       0.083          0.18          0.28          0.41
  2       0.11 (.11)*    0.25 (.23)    - (-)†        0.68 (.55)
  3       0.12 (.11)     0.29 (.25)    - (-)         1.39 (.70)
  4       0.15 (.14)     0.39 (.31)    - (-)         ∞ (.82)
  5       0.19 (.17)     0.49 (.36)    2.85 (.59)    ∞ (1.00)
  10      0.31 (.23)     ∞ (.50)       ∞ (.97)       ∞ (∞)
  15      0.62 (.29)     ∞ (.68)       ∞ (1.71)      ∞ (∞)

S-estimates
  S*      .49            ?             1.05          1.37
  LMS     .53            .83           1.07          1.52

Table 1. Min-Max Biases of Optimal GM-estimates with Estimated Covariance Matrix and Optimal S-estimates

* Numbers in parentheses are biases with covariance known (i.e., they correspond to points on the curves in Figure 2).
† These three missing values were not computed because we did not have available the corresponding biases for the covariance matrix estimate. We hope to provide the needed computation in the near future.

